Accuracy of Probabilistic Linkage Using the Enhanced Matching System for Public Health and Epidemiological Studies

نویسندگان

  • Robert W. Aldridge
  • Kunju Shaji
  • Andrew C. Hayward
  • Ibrahim Abubakar
  • Antonio Guilherme Pacheco
چکیده

BACKGROUND The Enhanced Matching System (EMS) is a probabilistic record linkage program developed by the tuberculosis section at Public Health England to match data for individuals across two datasets. This paper outlines how EMS works and investigates its accuracy for linkage across public health datasets. METHODS EMS is a configurable Microsoft SQL Server database program. To examine the accuracy of EMS, two public health databases were matched using National Health Service (NHS) numbers as a gold standard unique identifier. Probabilistic linkage was then performed on the same two datasets without inclusion of NHS number. Sensitivity analyses were carried out to examine the effect of varying matching process parameters. RESULTS Exact matching using NHS number between two datasets (containing 5931 and 1759 records) identified 1071 matched pairs. EMS probabilistic linkage identified 1068 record pairs. The sensitivity of probabilistic linkage was calculated as 99.5% (95%CI: 98.9, 99.8), specificity 100.0% (95%CI: 99.9, 100.0), positive predictive value 99.8% (95%CI: 99.3, 100.0), and negative predictive value 99.9% (95%CI: 99.8, 100.0). Probabilistic matching was most accurate when including address variables and using the automatically generated threshold for determining links with manual review. CONCLUSION With the establishment of national electronic datasets across health and social care, EMS enables previously unanswerable research questions to be tackled with confidence in the accuracy of the linkage process. In scenarios where a small sample is being matched into a very large database (such as national records of hospital attendance) then, compared to results presented in this analysis, the positive predictive value or sensitivity may drop according to the prevalence of matches between databases. Despite this possible limitation, probabilistic linkage has great potential to be used where exact matching using a common identifier is not possible, including in low-income settings, and for vulnerable groups such as homeless populations, where the absence of unique identifiers and lower data quality has historically hindered the ability to identify individuals across datasets.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Need for a Strong Public-Private Linkage in Agricultural Extension System (Case Study: Sari Township, Iran)

Relationship between public and private sector is becoming an increasingly important issue in management of agricultural extension services. The need for a strong linkage could be identified as the gap between desirable and current situation. In this research, the differences among current and desirable situation in six diverse dimensions was calculated. The current and desirable situation was ...

متن کامل

ارائه یک مدل احتمالاتی جهت تعیین انسجام متن در سیستم های پرسش و پاسخ تعاملی

Evaluation plays an important role in interactive question answering systems like many computational linguistics fields. The coherence between the questions and the answers exchanged between the user and the system is one of the important criteria in evaluating these systems. In this paper, a new approach to determine the degree of coherence of generated text by the IQA systems is presented. Th...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

Probabilistic linking to enhance deterministic algorithms and reduce linkage errors in hospital administrative data

background The pseudonymisation algorithm used to link together episodes of care belonging to the same patient in England [Hospital Episode Statistics ID (HESID)] has never undergone any formal evaluation to determine the extent of data linkage error. Objective To quantify improvements in linkage accuracy from adding probabilistic linkage to existing deterministic HESID algorithms. Methods Inpa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2015